Competitive evaluation of automated reasoning tools: Empirical scoring and statistical testing

Authors

  • Massimo Narizzano
  • Luca Pulina
  • Armando Tacchella
Abstract

Empirical scoring is the most common ranking method in automated reasoning systems competitions. Statistical testing can be used to validate the results of scoring, since the null hypothesis of equal performances is tested against the alternative hypothesis of significant difference in performances using a precise mathematical formulation. This paper evaluates the merits of statistical testing as a complement to empirical scoring using the 2005 comparative evaluation of solvers for quantified Boolean formulas as a case study.
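
To make the idea of testing a null hypothesis of equal performance concrete, the following is a minimal sketch of an exact two-sided sign test on paired solver runtimes. The abstract does not say which statistical test the authors apply, so the sign test is only a stand-in; the function name `sign_test_p_value` and the runtimes are hypothetical and not taken from the 2005 evaluation.

```python
from math import comb


def sign_test_p_value(times_a, times_b):
    """Exact two-sided sign test on paired runtimes.

    Null hypothesis: on any instance, solver A is as likely to be
    faster than solver B as the other way round. Ties are dropped.
    """
    wins_a = sum(a < b for a, b in zip(times_a, times_b))
    wins_b = sum(b < a for a, b in zip(times_a, times_b))
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # only ties: no evidence either way
    k = min(wins_a, wins_b)
    # Probability of k or fewer wins under Binomial(n, 0.5), doubled
    # for the two-sided alternative and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical runtimes in seconds on eight shared instances (made up).
solver_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
solver_b = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
p = sign_test_p_value(solver_a, solver_b)
```

A small p-value would suggest that a ranking which places one solver above the other reflects more than chance on this pool of instances; a large one would suggest the scoring difference may not be meaningful.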

Similar resources

Which system should I buy? A case study about the QBF solvers competition

Systems competitions play a fundamental role in the advancement of the state of the art in several automated reasoning fields. The goal of such events is to answer the question: “Which system should I buy?”. Usually the answer comes as the byproduct of a ranking obtained by considering a pool of problem instances and then aggregating the performances of the systems on each member of the pool. E...

Interaction of reasoning ability and training intervention in reaction to training evaluation and post training effectiveness.

It has been shown that learners' abilities interact with the type of training intervention and effect on training and its outcomes. For this reason, the current research investigated the interaction of reasoning ability with two training methods, namely deductive and empirical methods, in effect on reaction to training evaluation and post training effectiveness. This research was an applied an...

Automated Deduction and Usability Reasoning

Building systems that are correct by design has always been a major challenge of software development. Typical software development approaches (and in particular interactive systems development approaches) are based around the notion of prototyping and testing. However, except for simple systems, testing cannot guarantee absence of errors, and, in the case of interactive systems, testing with r...

Semi-quantitative segmental perfusion scoring in myocardial perfusion SPECT: visual vs. automated analysis

Introduction: It is recommended that the physician apply at least a semi-quantitative segmental scoring system in myocardial perfusion SPECT. We aimed to assess the agreement between automated semi-quantitative analysis using QPS (Quantitative Perfusion SPECT) software and visual approach for calculation of summed stress score (SSS), summed rest score (SRS) and summed difference score (SDS). ...

Publication date: 2006